Mitogenome sequences of domestic cats demonstrate lineage expansions and dynamic mutation processes in a mitochondrial minisatellite

Background: As a population genetic tool, mitochondrial DNA is commonly divided into the ~1-kb control region (CR), in which single nucleotide variant (SNV) diversity is relatively high, and the coding region, in which selective constraint is greater and diversity lower, but which provides an informative phylogeny. In some species, the CR contains variable tandemly repeated sequences that are understudied due to heteroplasmy. Domestic cats (Felis catus) have a recent origin and therefore traditional CR-based analysis of populations yields only a small number of haplotypes.
Results: To increase resolution we used Nanopore sequencing to analyse 119 cat mitogenomes via a long-amplicon approach. This greatly improves discrimination (from 15 to 87 distinct haplotypes in our dataset) and defines a phylogeny showing similar starlike topologies within all major clades (haplogroups), likely reflecting post-domestication expansion. We sequenced RS2, a CR tandem array of 80-bp repeat units, placing RS2 array structures within the phylogeny and increasing overall haplotype diversity. Repeat number varies between 3 and 12 (median: 4) with over 30 different repeat unit types differing largely by SNVs. Five SNVs show evidence of independent recurrence within the phylogeny, and seven are involved in at least 11 instances of rapid spread along repeat arrays within haplogroups.
Conclusions: In defining mitogenome variation our study provides key information for the forensic genetic analysis of cat hair evidence, and for the first time a phylogenetically informed picture of tandem repeat variation that reveals remarkably dynamic mutation processes at work in the mitochondrion.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-023-09789-1.


Supplementary Methods: Investigation of RS2 region sequence variation
To obtain long-read sequences spanning the entirety of the RS elements, one of three different barcoded versions (CTGAGGTGATCAG, ATCCGGTCGGAGA or AGTGTCCTGCTAG) of primers 57F (CCCATCTCAGGCATTATTG) and 3R (TTTAAGCTACATTAACTGGGA), consisting of the same 13-bp 5´ sequence in both the forward and reverse primer, was used to amplify coordinates 16136-881 of the mitogenomes of the 26 cats that had been sequenced only with Illumina technology. Amplification was performed in a 10 µl volume containing 10 ng of template DNA, 0.3 µM of each primer, 0.9 µl of 11.1x Buffer [1], and 0.06 µl of a 20:1 mix of PCRBIO Taq Polymerase (PCRBiosystems; 5 U/µl): Pfu DNA polymerase (Agilent Technologies; 2.5 U/µl).
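The amplified interval 16136-881 runs across the origin of the circular mitogenome, so its length cannot be read off as a simple difference. The sketch below shows the wrap-around arithmetic. The function name `amplicon_length` is hypothetical, and the reference length of 17,009 bp for NC_001700 is an assumption that should be checked against the reference record actually used.

```python
def amplicon_length(start: int, end: int, genome_length: int) -> int:
    """Length of a 1-based, inclusive interval on a circular genome.

    If end < start, the interval is taken to wrap through the origin.
    """
    if not (1 <= start <= genome_length and 1 <= end <= genome_length):
        raise ValueError("coordinates must lie within 1..genome_length")
    # Modulo handles both the linear case and the wrap-around case.
    return (end - start) % genome_length + 1


# Assumed length of the NC_001700 reference (not stated in the text above).
REFERENCE_LENGTH = 17009

print(amplicon_length(16136, 881, REFERENCE_LENGTH))
```

Under that assumed reference length the target spans roughly 1.75 kb of reference sequence. The true product length in any given cat will differ with the number of repeat units in the RS arrays.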
PCR cycling conditions consisted of an initial denaturation at 96°C for 1 min, followed by 28 cycles of denaturation at 96°C for 20 s, annealing at 56°C for 30 s, and extension at 68°C for 2 min, and a final extension at 68°C for 25 min. PCR products were quantified by agarose gel electrophoresis, and equimolar amounts of products from cats with different combinations of forward and reverse barcodes were pooled together. The twelve pools were purified with a 1.8x volume of AppMag PCR Clean Up Beads (Appleton Woods Ltd.) following the manufacturer's instructions, and the purified products were quantified using the dsDNA HS Assay kit with the Qubit 2.0 fluorometer.
After DNA library preparation and sequencing (see main text), Guppy (v5.0.16) was used for basecalling with Q-score filtering disabled. Demultiplexing of native barcodes was also carried out using Guppy, filtering for the presence of barcodes on both ends of each read.
MiniBar (v0.21) [2] was used for custom barcode demultiplexing based on the presence of the custom barcode within the first and last 150 bp of a given read, with these sequences then trimmed from that read. Up to 11 differences in the primer sequences and two differences in the barcode sequences were allowed. Reads with a quality score below 11, or with a length shorter than 1000 bp or longer than 3000 bp, were removed using NanoFilt (v2.8.0) [3]. The resulting reads were mapped to the domestic cat reference mtDNA sequence (NC_001700) using minimap2 (v2.17) [4], with the mapping files converted to BAM files using SAMtools (v1.13) [5].
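The quality and length criteria above can be illustrated with a minimal sketch. This is not NanoFilt's code: the function names are hypothetical, and the mean quality is computed by averaging per-base error probabilities and converting back to a Phred score, which is one common convention and is assumed here rather than taken from the text.

```python
import math
from typing import Sequence


def mean_read_quality(phred_scores: Sequence[int]) -> float:
    """Mean read quality, averaged in error-probability space (assumed convention)."""
    if not phred_scores:
        raise ValueError("read has no quality scores")
    mean_error = sum(10 ** (-q / 10) for q in phred_scores) / len(phred_scores)
    return -10 * math.log10(mean_error)


def passes_filter(
    phred_scores: Sequence[int],
    min_quality: float = 11.0,   # reads below Q11 are removed
    min_length: int = 1000,      # reads shorter than 1000 bp are removed
    max_length: int = 3000,      # reads longer than 3000 bp are removed
) -> bool:
    """Return True if a read would be kept under the thresholds described above."""
    length = len(phred_scores)
    if length < min_length or length > max_length:
        return False
    return mean_read_quality(phred_scores) >= min_quality
```

Averaging error probabilities lets a few very poor bases pull the read quality down more strongly than a plain arithmetic mean of Phred scores would.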
At this stage, the filtered, un-subsampled ~9-kb reads from the 93 long-range (Nanopore-sequenced) cats and the reads from the 26 cats sequenced above had been aligned to the reference genome. Using the BAM files, SAMtools ampliconclip was used to clip reads using the options --hard-clip --both-ends --filter-len 0. Sequences extending beyond coordinates 16136-16955 bp for the Flongle-sequenced cats, and coordinates 15591-16955 bp for the MinION-sequenced cats, were clipped. The BAM files were converted back to FASTQ files using BEDTools (v2.28.0) [6]. Reads in the FASTQ files were binned by length, with bins separated by ~80 bp. The major length species of each cat was used as input to Amplicon_sorter [7], which generated a consensus sequence. If more than one consensus sequence was generated per cat sample, the sequence with the greatest number of supporting reads was retained.
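Because RS2 repeat units are 80 bp long, arrays that differ by one repeat yield reads that differ in length by about 80 bp, which is the rationale for the bin spacing. The following sketch shows one way the major length species could be identified from a list of read lengths. The function names are hypothetical, and anchoring the bins at multiples of the bin width is an assumption, since the text does not state where the bin boundaries were placed.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def major_length_bin(read_lengths: Sequence[int], bin_width: int = 80) -> Tuple[int, int]:
    """Return (bin_start, read_count) for the most populated length bin.

    Bins are [k * bin_width, (k + 1) * bin_width); this anchoring is an assumption.
    """
    if not read_lengths:
        raise ValueError("no reads supplied")
    counts = Counter(length // bin_width for length in read_lengths)
    bin_index, n_reads = counts.most_common(1)[0]
    return bin_index * bin_width, n_reads


def reads_in_bin(read_lengths: Sequence[int], bin_start: int, bin_width: int = 80) -> List[int]:
    """Indices of reads whose length falls inside the given bin."""
    return [
        i for i, length in enumerate(read_lengths)
        if bin_start <= length < bin_start + bin_width
    ]


# Toy example: three reads fall in the 800-879 bp bin, which is the major species.
lengths = [820, 825, 830, 900, 905, 1000]
start, n = major_length_bin(lengths)
selected = reads_in_bin(lengths, start)
```

Fixed bin boundaries can split a single length species across two bins when its reads straddle a boundary, so inspecting the length histogram before choosing the bin origin may be advisable.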
In addition, for a small subset of cats a segment stretching across the RS2 region was amplified and the PCR products visualised by agarose gel electrophoresis. PCR amplifications were carried out in a 10 µl volume, which contained 10 ng of template DNA, 0.3 µM of each of the 58F (CATATATTGCATATACCCATACTG) and 60R (GCTGAGTCATAGCAAGTTCG) primers, 0.9 µl of 11.1x Buffer [1], and 0.06 µl of a 20:1 mix of PCRBIO Taq Polymerase (PCRBiosystems; 5 U/µl): Pfu DNA polymerase (Agilent Technologies; 2.5 U/µl). Amplification conditions consisted of an initial denaturation at 96°C for 1 min, followed by 28 cycles at 96°C for 20 s, 62°C for 30 s, 68°C for 1 min, and a final extension at 68°C for 25 min.